This page last changed on Oct 18, 2008 by straha1.

Introduction

This facility is a shared resource for research at UMBC that requires a high-performance computer, particularly a parallel computer. The following policies are intended to help make this facility effective for its users and to ensure the maintenance of the facility. For the long-term benefit of everybody, it is vital that all users comply with all aspects of these policies.

These policies are subject to active development at this time, in response to issues that come to our attention and in response to usage patterns. This webpage always shows the current usage policies in effect.

There are several aspects to usage policies on a large computer that is shared by many users, and there are additional aspects for a facility that relies on active support from its users for its maintenance. Therefore, the following items are grouped by purpose.

If you have any questions or concerns, do not hesitate to contact the chair of the user committee; see the contact information.

Obligation of All Users to Help Maintain the Facility

This machine has been created with financial and in-kind support from both faculty and UMBC. To ensure the long-term existence of this facility, all users have an obligation to help actively sustain it. This obligation has financial and scientific (non-financial) aspects, and support in both respects is required from all users to maintain their accounts on the systems. The requirements include the following methods of support:

  • Each user must provide a title and abstract for all research projects conducted on the facility's machines. Each project should have its own title and abstract. This information will be posted on the facility's webpage to demonstrate its uses.
  • Each user is required to provide information on outcomes of the research conducted on the facility's machines. This includes information both on papers submitted and published and on presentations given. We are happy to post PDF files of papers or presentations on the facility's webpage or to link to another webpage.
  • Each user must acknowledge the use of this facility, for instance, in papers and presentations. Proper acknowledgement may use the following sentence: "The computational resources used for this work were provided by the UMBC High Performance Computing Facility at the University of Maryland, Baltimore County (UMBC); see www.umbc.edu/hpc for information on the facility and its uses."
  • Each user (or the sponsoring PI, if the user is not a faculty member) must be willing to participate as co-PI or co-investigator in future grant proposals. This implies a willingness to supply short descriptions of the research and its results and to provide the necessary information for grant proposals (bio sketch, current/pending support, and similar), when requested.
  • Each user is required to include budget requests for computational resources in individual grant proposals. The support requested should be commensurate with the amount of resources typically used; the cost per node for contributing users (see below) is a guide for the cost. To support such efforts, we are ready to help with your proposal by drafting text, acting as co-PI/co-investigator, supplying a support letter, or in whatever way is suitable. Contact the chair of the user committee well before your proposal due date to work out details.
  • All users including principal investigators must confirm when requested that they and their research group still require the account on the facility's machines. Specifically, at the beginning of every Fall semester, all accounts will be reviewed to determine if they should be continued. The purpose is to avoid large numbers of inactive accounts. This facility is not suitable for long-term data storage; users are required to move their data off the machine at the completion of projects. An account cannot be kept open solely for the purpose of access to data on the machine.
  • Users who wish to continue their account on the system are required to supply proof of outcomes of their usage of the machine, including, for instance, publications, presentations, preprints, and grant proposals that include funding requests for nodes on the machine. Users are required to submit such proof continuously throughout the year, and specifically at the time of the account review at the beginning of the Fall semester. If no information is received upon request or there has been no effort to help maintain the facility, the user's account, including all accounts sponsored by that faculty member, will be suspended and/or their usage priority reduced. To help with the documentation of research results, we provide as part of this webpage a preprint server where technical reports of results can be posted, as well as webpages for each project, where publications and presentations of the research can be posted throughout the year.

The philosophy adopted here is one of granting an account on this facility first and then requiring help in maintaining it, as opposed to requiring up-front payment to use the facility. This approach allows researchers to start using the facility immediately at any point in the year and to obtain initial research results with it. In turn, using these results, users must then actively demonstrate outcomes and seek funding to sustain the facility.

Access to the Facility

This facility is a shared resource for research at UMBC that requires a high-performance parallel computer. To get an account on this facility, please submit a completely filled-out account request form. To maintain access, users must follow all policies outlined in the following at all times. To ensure the success of this facility in the long run, it is vital that there be demonstrated research results created on this machine; hence, the initial users should have an ongoing program of high-performance computing research. As usage patterns develop, the machine is fully set up, and it grows in its number of nodes, we anticipate that the number of accounts can be increased. Do not hesitate to contact the chair of the user committee at any time to get an understanding of the current usage levels of the machine and whether capacity is available.

Users are invited to contribute direct funding to the facility at $5,000 per node or a multiple thereof. Contributions from faculty in this way will be bundled and used for a hardware purchase ordinarily once a year. Contributing this money gives these users priority access over other users to that number of nodes, in the sense explained in the following.

Access to compute nodes will be managed by job scheduling software (the scheduler) that reserves compute nodes for users. The scheduler reserves compute nodes based on the availability of resources in combination with a user's priority. The following principles will guide the setup of the scheduler:

  • The scheduler schedules jobs on a first-come, first-served basis among users with equal priority, assuming availability of the requested resources (number of nodes, nodes with certain features, etc.). Users who contribute funding to the cluster enjoy an increased priority for scheduling their jobs, up to the node-hours explained below.
  • Users' jobs are generally limited to no more than 23 node-hours; that is, the product of the number of nodes and the number of hours the job runs is limited to 23. Additionally, if current usage patterns on the machine allow for it, we are happy to let users run longer or larger jobs by arrangement; contact the chair of the user committee. Example 1: A serial job (1 node used) must complete within 23 hours. Example 2: If 23 nodes are used for the job, the length of the job is limited to 1 hour.
  • The guiding principle is that users who contribute funding to the cluster have the right to use their number of nodes for 23 hours per day without a time limitation. The remaining hour of each day is reserved for running jobs that require a larger number of nodes than is otherwise available, if there are such requests in the queue. The scheduler will be set up to pause all jobs and restart them within an hour, or after all requests for large numbers of nodes are satisfied, whichever comes first.
  • In practice, the right of contributing users to use their number of nodes for 23 hours per day is implemented by giving them a monthly allotment of node-hours equal to their number of nodes multiplied by 23 hours times the number of days in a month (using, for simplicity, 30 days for all months). This means that such users can either request their number of nodes for 23 hours on every day of the month, or choose to request more nodes than their own and use them for the time available until the allotment runs out. Naturally, users can continue to submit jobs beyond the node-hours allotted. The point is that these users enjoy a priority in scheduling up to the point of using up their node-hours in each month, after which they no longer enjoy a priority.
  • The above rules do not apply to system administration and testing of the machine, including select users running jobs for the purpose of testing, debugging, or benchmarking the system. For instance, users with existing code may be specifically running large jobs to test the new system; that is, it is not just the actual system administrator running such jobs. Such efforts will be coordinated by the chair of the user committee in collaboration with OIT and the user committee. We anticipate that such activity is limited to the initial phase of the machine or after significant changes in, e.g., hardware or software.
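
The node-hour arithmetic above can be illustrated with a short calculation. This is only a sketch of the policy's numbers (the 23 node-hour job limit and the nodes × 23 hours × 30 days monthly allotment); the function names are illustrative and do not correspond to any actual scheduler interface:

```python
# Sketch of the scheduling limits described in the policy above.
# The constants come from the policy text; the functions are illustrative only.

JOB_LIMIT_NODE_HOURS = 23   # per-job limit: nodes * hours <= 23
HOURS_PER_DAY = 23          # guaranteed daily hours for contributing users
DAYS_PER_MONTH = 30         # simplification used by the policy for all months

def job_allowed(nodes, hours):
    """Check whether a job fits within the standard 23 node-hour limit."""
    return nodes * hours <= JOB_LIMIT_NODE_HOURS

def monthly_allotment(contributed_nodes):
    """Priority node-hours per month for a user who funded this many nodes."""
    return contributed_nodes * HOURS_PER_DAY * DAYS_PER_MONTH

# Example 1: a serial job (1 node) may run for up to 23 hours.
print(job_allowed(1, 23))    # True
# Example 2: a 23-node job is limited to 1 hour; 2 hours would exceed the limit.
print(job_allowed(23, 1))    # True
print(job_allowed(23, 2))    # False
# A user who contributed 2 nodes gets 2 * 23 * 30 = 1380 priority node-hours.
print(monthly_allotment(2))  # 1380
```

Beyond the printed allotment, such a user simply reverts to standard (non-priority) scheduling for the rest of the month, as described above.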

It is noted explicitly that the above usage policies are subject to modification if it turns out that certain features cannot be implemented in the scheduling software. In any case, it is hoped that the number of users will be reasonably small initially and that they will be in frequent communication with each other to coordinate the scheduling of jobs cooperatively, until usage patterns become clear and the setup of the scheduler is improved. This is stated in the spirit that setting up the scheduler is in fact not the initial priority for the administration of the cluster; rather, proper testing, debugging, benchmarking, optimizing the performance of the hardware and software, and initial results are more important priorities for the first months of the machine.

Good User Behavior

On a day-to-day basis, it is imperative that users run their code in a responsible fashion, so as not to hinder or damage other users' work. To this end, the following rules must be followed at all times. Complying with many of these common-sense rules may require you to understand something about parallel computing or about the setup of the hardware and software of the machine. Do not hesitate to contact the chair of the user committee as the point of contact to ask questions or to report potential problems.

  • All users must use the batch submission system of the scheduler running on the user node to reserve compute nodes for their use. You are not allowed to log in to the compute nodes for the purpose of running jobs directly there. For certain purposes, interactive use of a node may be necessary; if you need this, please contact the chair of the user committee. Only users who need this and who have received appropriate training are allowed interactive use of any compute node.
  • Users will be notified by e-mail about issues related to the system, such as scheduled downtime, upgrades, etc. Such mail may also include requests for information and feedback. Users are required to monitor the e-mail address on file with the chair of the user committee and are required to respond to contacts. This is part of the active communication necessary for a shared resource such as this to be used effectively by all users. Currently, the time slot for scheduled downtime is every Tuesday evening. This downtime window may not be used every week, but users should plan their work accordingly. An effort will be made to send out an announcement of each actual scheduled downtime, but users should not rely on such notices.

IMPORTANT NOTE: If users are observed to violate any of the above rules or are behaving in any way that impacts other users' ability to use the resource, the chair of the user committee has the right to terminate the user's jobs and/or to suspend the user's account. Ordinarily, we will try to make contact with the user first to discuss what is going on and to try to work with the user, but if other users are impacted, the account can be suspended first. Decisions by the chair of the user committee are subject to review by the user committee; see the contact information for a list of the members of the user committee.

Document generated by Confluence on Mar 31, 2011 15:37